Augmenting the power of LSI in text retrieval: Singular value rescaling

Authors

  • Hua Yan
  • William I. Grosky
  • Farshad Fotouhi

Abstract

This paper analyzes several LSI (latent semantic indexing) query approaches and proposes a novel rescaling technique, singular value rescaling (SVR). Experiments on a standardized TREC data set confirmed the effectiveness of SVR, showing an improvement ratio of 5.9% over the best conventional LSI query approach. We also compared SVR with another scaling technique used in text retrieval, iterative residual rescaling (IRR); on the TREC data set, SVR performs better than IRR. © 2007 Elsevier B.V. All rights reserved.
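The abstract contrasts LSI query approaches that differ mainly in how the singular values are used. The sketch below makes that concrete. It scores documents against a query in a rank-k LSI space and adds a single exponent `alpha` that rescales the retained singular values. This is an illustrative assumption, not the paper's SVR formulation. The function name `lsi_scores`, the parameter `alpha`, and the power-law form Σ_k^α are all hypothetical. The only library call is NumPy's `numpy.linalg.svd`.

```python
import numpy as np


def lsi_scores(A, q, k, alpha=1.0):
    """Cosine scores of every document against a query in a rank-k LSI space.

    A     : term-document matrix, shape (n_terms, n_docs)
    q     : query vector over the same terms, shape (n_terms,)
    k     : number of singular triplets kept (must not exceed rank(A))
    alpha : hypothetical exponent applied to the retained singular values
            (an illustrative stand-in for "rescaling", not the paper's formula)
    """
    # Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

    # Document coordinates V_k Sigma_k^alpha (one row per document).
    docs = V_k * s_k ** alpha
    # Fold the query in (q^T U_k Sigma_k^-1), then apply the same rescaling.
    query = (q @ U_k) * s_k ** (alpha - 1.0)

    # Cosine similarity; the small constant guards against zero-length vectors.
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
    return (docs @ query) / (norms + 1e-12)


if __name__ == "__main__":
    # Toy 4-term x 3-document matrix of raw term counts, purely illustrative.
    A = np.array([[2.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])
    q = np.array([1.0, 1.0, 0.0, 0.0])
    for alpha in (0.0, 0.5, 1.0):
        scores = lsi_scores(A, q, k=2, alpha=alpha)
        print(alpha, np.argsort(-scores), np.round(scores, 3))
```

With `alpha=0`, the folded-in query q^T U_k Σ_k^{-1} is compared with the rows of V_k. With `alpha=1`, the dot products equal A_k^T q, so the query is matched against the rank-k reconstruction A_k = U_k Σ_k V_k^T. Other values of `alpha` change how strongly the leading latent dimensions dominate the cosine score.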


Similar resources

LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier

The task of Text Classification (TC) is to automatically assign natural language texts with thematic categories from a predefined category set. And Latent Semantic Indexing (LSI) is a well known technique in Information Retrieval, especially in dealing with polysemy (one word can have different meanings) and synonymy (different words are used to describe the same concept), but it is not an opti...

An Application of LSI and M-tree in Image Retrieval

When dealing with image databases, we often need to solve the problem of how to retrieve a desired set of images effectively and efficiently. As a representation of images, there are commonly used some high-dimensional vectors of extracted features, since in such a way the content-based image retrieval is turned into a geometric-search problem. In this article we present a case study of feature...

Framework for Document Retrieval using Latent Semantic Indexing

Today, with the rapid development of the Internet, textual information is growing rapidly. So document retrieval which aims to find and organize relevant information in text collections is needed. With the availability of large scale inexpensive storage the amount of information stored by organizations will increase. Searching for information and deriving useful facts will become more cumbersom...

Lower Dimensional Representation of Text Data in Vector

Dimension reduction in today's vector space based information retrieval system is essential for improving computational efficiency in handling massive data. In this paper, we propose a mathematical framework for lower dimensional representation of text data in vector space based information retrieval using minimization and matrix rank reduction formula. We illustrate how the commonly used Latent ...

Clustered SVD strategies in latent semantic indexing

The text retrieval method using latent semantic indexing (LSI) technique with truncated singular value decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term–document matrix and improves the information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small homogeneous data collect...

Journal:
  • Data Knowl. Eng.

Volume 65

Published 2008